ABSTRACT

We explore the bluer star-forming population of the Sloan Digital Sky Survey (SDSS)
III/BOSS CMASS DR11 galaxies at $z > 0.55$ to quantify their differences, in terms
of redshift-space distortions and large-scale bias, with respect to the luminous red
galaxy sample. We perform a qualitative analysis to understand the significance of
these differences and whether we can model and reproduce them in mock catalogs.
Specifically, we measure galaxy clustering in CMASS on small and intermediate scales
($r < 50\,h^{-1}$ Mpc) by computing the two-point correlation function of these
galaxies, both projected and in redshift space, and a new statistic, $\Sigma(\pi)$,
able to provide robust information about redshift-space distortions and large-scale
bias. We interpret our clustering measurements by adopting a Halo Occupation
Distribution (HOD) scheme that maps them onto high-resolution N-body cosmological
simulations to produce suitable mock galaxy catalogs. The traditional HOD prescription
can be applied to the red and the blue samples, independently, but this approach is
unphysical since it allows the same mock galaxies to be either red or blue. To overcome
this failure, we modify the standard formulation and infer the red and the blue mock
catalogs directly from the full one, so that they are complementary and non-overlapping.
This separation is performed by matching the observed CMASS red and blue galaxy
fractions and produces reliable and accurate models.

Key words: galaxies: distances and redshifts — galaxies: haloes — galaxies: statistics 
— cosmology: observations — cosmology: theory — large-scale structure of Universe 


1 INTRODUCTION 

In the last decade, an enormous effort has been spent to explore the formation and
evolution of the large-scale structure of our Universe. The standard cold dark matter
($\Lambda$CDM) model with cosmological constant, together with the theory of cosmic
inflation, has become the leading theoretical picture in which structures can form,
providing a clear prediction for their initial conditions and hierarchical growth
through gravitational instability (e.g., Primack 1997). Testing this model requires
one to combine large N-body simulations with measurements from the latest generation
of large-volume photometric and spectroscopic galaxy surveys, such as the Sloan
Digital Sky Survey (SDSS; York et al. 2000; Gunn et al. 2006; Smee et al. 2013) and
the SDSS-III Baryon Oscillation Spectroscopic Survey (BOSS; Eisenstein et al. 2011;
Dawson et al. 2013). In particular, BOSS has been able to measure the Baryon Acoustic
Oscillation (BAO) feature in the clustering of galaxies and of the Lyman-$\alpha$
forest with unprecedented accuracy, by collecting spectra of 1.5 million galaxies up
to $z = 0.7$ (Anderson et al. 2014), over a 10,000 deg$^2$ area of sky, and about
160,000 Lyman-$\alpha$ forest spectra of quasars in the redshift range $2.2 < z < 3$
(Slosar et al. 2011).

The $\Lambda$CDM paradigm claims that galaxies form at the center of dark matter
halos, so estimating the clustering features of such complex structures is currently
one of the main targets of modern cosmology (Kravtsov & Borgani 2012). Despite the
recent dramatic improvement in the observational data, what primarily prevents us from
achieving this goal immediately is the theoretical uncertainty of galaxy bias, i.e.,
the difference between the distribution of galaxies and that of the matter. Galaxies
are treated as biased tracers of the underlying matter distribution, and observations
of their clustering properties are used to infer those cosmological parameters that
govern the matter content of the Universe. In this context, the Halo Occupation
Distribution (HOD; Berlind & Weinberg 2002; Kravtsov et al. 2004; Zheng et al. 2005,
2007) framework has emerged as a powerful tool to bridge the gap between galaxies and
dark matter halos, characterizing their mutual relation in terms of the probability,
$P(N|M)$, that a halo of virial mass $M$ contains $N$ galaxies of a given type. At
the same time, it provides a robust prediction of the relative spatial and velocity
distributions of galaxies and dark matter within halos. In this approach, the use of
large-volume N-body cosmological simulations is crucial to produce reliable maps of
the dark matter sky distribution.

In this work, we explore the red/blue color bimodality observed in the CMASS sample
of BOSS DR11 (Alam et al. 2015) galaxies. In order to quantify and model the
differences between these two galaxy populations, we measure their clustering signal
on small and intermediate scales, from $r \sim 0.1\,h^{-1}$ Mpc up to
$r \sim 50\,h^{-1}$ Mpc. We compute the two-point correlation function (2PCF) of the
BOSS CMASS galaxies, both projected and in redshift space, and a new metric,
$\Sigma(\pi)$, designed to extract information about the small-scale nonlinear
redshift-space distortion effects. We then map our results to the MultiDark
cosmological simulation (Prada et al. 2011; Riebe et al. 2011) using an HOD approach
(Zheng et al. 2007; White et al. 2011), to generate suitable mock galaxy catalogs. In
this context, we investigate whether we can find an HOD parametrization able to model
both the blue and red observed clustering amplitudes, with small variations in its
parameters. As an alternative to HOD models, one can interpret clustering observations
with a Halo Abundance Matching (HAM) prescription (e.g., Trujillo-Gomez et al. 2011;
Nuza et al. 2013), with the advantage of avoiding free parameters, only assuming that
more luminous galaxies are associated with more massive halos, monotonically, through
their number densities. HAM is a straightforward technique that provides accurate
predictions for clustering measurements; nevertheless, we choose to model our CMASS
clustering measurements using a five-parameter HOD scheme because it is a general
method, based on a halo mass parametrization, and does not require a specific
luminosity (stellar mass) function (Montero-Dorta et al. 2014) to reproduce the
observations.

Besides the traditional HOD approach, where each galaxy population has its own
independent model defined by a different set of parameters, we test an alternative
prescription, in which the red and the blue models are recovered by splitting the full
mock catalog using suitable conditions to mimic the observed CMASS red and blue galaxy
fractions, as a function of the central halo mass. In this way, the resulting mocks
are no longer independent, since they are based on the same HOD parameter set, and the
total number of degrees of freedom is reduced from 15 (three independent models, with
five parameters each) to 5 (full HOD) plus 2 (galaxy fraction constraint). The main
motivation for this new approach is that the classical HOD parametrization reproduces
the full CMASS population well, but provides non-physical predictions when applied to
the red and blue sub-samples independently. In fact, in the process of populating a
halo with central and satellite galaxies, this kind of modeling allows the same galaxy
to be either red or blue, i.e., to be placed in halos with different masses. To
overcome this problem, we adopt a new HOD formulation, in which the red/blue split
observed in our data sample is used as a discriminant condition to perform an
unambiguous galaxy assignment.

We investigate the impact of redshift-space distortions on the clustering signal, both
on small (1-halo term) and intermediate (2-halo term) scales. Our new metric,
$\Sigma(\pi)$, allows us to separate and quantify both the nonlinear elongation seen
in the two-point correlation function below $2\,h^{-1}$ Mpc, and the Kaiser
compression at scales beyond $10\,h^{-1}$ Mpc. We model these effects in terms of two
parameters, $A$ and $G$, respectively encoding the galaxy velocity dispersion with
respect to the surrounding Hubble flow, and the linear large-scale bias. In agreement
with several previous works (see, for instance, Wang et al. 2007; Zehavi et al. 2005b;
Swanson et al. 2008), we find that red galaxies are more clustered (i.e., they show a
higher peculiar velocity contribution) and more biased than their blue star-forming
companions.

The paper is organized as follows. In Section 2 we introduce the methodology used to
measure and model galaxy clustering in the BOSS CMASS DR11 galaxy sample: we define
the metrics we examine, the correlation function and the covariance estimators. We
then provide an overview of the MultiDark simulation, discuss the HOD formalism
adopted to create mock galaxy catalogs, and introduce the analytic tools used to model
both finger-of-god and Kaiser effects. In Section 3 we present the CMASS DR11 sample
and the specific red/blue color selection used in our analysis, illustrate how to
weight the data to account for fiber collision and redshift failure effects, and
outline the procedure adopted to generate randoms. In Section 4 we describe how we
model our full CMASS clustering measurements by building reliable mock galaxy catalogs
that take into account the contribution of redshift-space distortions, and present the
first results for the three metrics of interest: $\xi(s)$, $w_p(r_p)$ and
$\Sigma(\pi)$. In Section 5 we first apply the same procedure individually to the red
and blue CMASS galaxy sub-samples to create their own independent mock catalogs; then,
we propose an alternative method to separate the red and blue populations using, as a
constraint, the observed CMASS red/blue galaxy fractions. Our data versus mock
$\Sigma(\pi)$ results, compared to the $A$, $G$ analytic models, are shown in
Section 6. Section 7 reports our main conclusions.

2 METHODS 

2.1 Clustering Measurements 

We quantify the clustering of galaxies by computing the two-point correlation
function, i.e., the excess probability over random of finding a pair of galaxies,
typically parameterized as a function of their comoving separation (see, e.g., Peebles
1980). The galaxy correlation function is well known to approximate a power law across
a wide range of scales,

$$\xi(r) = \left(\frac{r}{r_0}\right)^{-\gamma}, \tag{1}$$

where $r_0$ is the correlation length, and $\gamma$ is the power-law slope or spectral
index. However, improved models (see the review by Cooray & Sheth 2002) have been shown
to better match the data (Zehavi et al. 2004).

The redshift-space correlation function differs from the real-space one due to the
distortion effects caused by our inability to separate the peculiar velocities of
galaxies from their recession velocity when we estimate distances from the redshift.
These distortions introduce anisotropies in the 2PCF in two different ways. On large
scales, where the linear regime holds, galaxies experience a slow infall toward an
over-dense region, and the peculiar velocities make structures appear squashed in the
line-of-sight direction, an effect commonly known as "Kaiser compression" (Kaiser
1987; Hamilton 1998). At smaller scales, nonlinear gravitational collapse creates
virialized systems and thereby relatively large velocity differences arise between
close neighbors, resulting in structures appearing significantly stretched along the
line-of-sight (Jackson 1972). This effect is commonly referred to as the
"finger-of-god" (FoG).

We are interested in using three related two-point clustering metrics: the
redshift-space monopole, $\xi(s)$, the projected correlation function, $w_p(r_p)$, and
a new line-of-sight focused measurement to capture small-scale redshift-space
distortion effects, $\Sigma(\pi)$, which we define below. In our formalism, $s$
represents the redshift-space pair separation, while $r_p$ and $\pi$ are the
perpendicular and parallel components with respect to the line-of-sight such that
$s^2 = r_p^2 + \pi^2$. We can parameterize the redshift-space correlation function as
a function of redshift-space separation $s$ or, equivalently, in terms of $r_p$ and
$\pi$. We can mitigate the impact of redshift distortions by integrating along the
line-of-sight to approximate real-space clustering (Davis & Peebles 1983) in the
projected correlation function,

$$w_p(r_p) = 2 \int_0^{\infty} \xi(r_p, \pi)\, d\pi. \tag{2}$$

This integration is performed over a finite line-of-sight distance as a discrete sum,

$$w_p(r_p) = 2 \sum_i^{\pi_{\rm max}} \xi(r_p, \pi_i)\, \Delta\pi_i, \tag{3}$$

where $\pi_i$ is the $i$-th bin of the line-of-sight separation, and $\Delta\pi_i$ is
the corresponding bin size. We use $\pi_{\rm max} = 80\,h^{-1}$ Mpc and
$\Delta\pi = 10\,h^{-1}$ Mpc.
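The discrete sum in Eq. 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `projected_wp` and the array layout (rows are $r_p$ bins, columns are $\pi$ bins up to $\pi_{\rm max}$) are hypothetical and not taken from the paper.

```python
import numpy as np

def projected_wp(xi_rp_pi, delta_pi=10.0):
    """Projected correlation function as the discrete sum of Eq. 3.

    xi_rp_pi : array of shape (n_rp, n_pi) holding xi(r_p, pi_i) on a grid
               whose pi bins all have width delta_pi (in h^-1 Mpc).
    Returns an array of length n_rp with w_p(r_p).
    """
    xi_rp_pi = np.asarray(xi_rp_pi, dtype=float)
    # Factor 2 accounts for the two signs of the line-of-sight separation.
    return 2.0 * xi_rp_pi.sum(axis=1) * delta_pi

# Example: 8 pi bins of 10 h^-1 Mpc reach pi_max = 80 h^-1 Mpc.
xi_grid = np.full((3, 8), 0.5)
wp = projected_wp(xi_grid)
```

With equal-width bins the sum reduces to a single multiplication by $\Delta\pi$; unequal bins would need a per-bin width array instead.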

Since $w_p(r_p)$ is not affected by redshift-space distortions, the best-fit power law
is equivalent to a real-space measurement. One can therefore quantify the deviation of
the redshift-space $\xi(r_p, \pi)$ correlation function from the real-space behavior
by measuring the ratio

$$\Sigma(\pi) = \frac{\xi(\bar{r}_p, \pi)}{\xi(\pi)}, \tag{4}$$

where $\xi(\pi)$ is the best-fit power law to $w_p(r_p)$, evaluated at the $\pi$
scale, and $\bar{r}_p$ indicates that we perform a spherical average in the range
$0.5 \leq r_p \leq 2\,h^{-1}$ Mpc. This statistic illuminates the nonlinear FoG
effects by normalizing out the expected real-space clustering along the line-of-sight
direction. It is therefore preferable to measuring the quadrupole-to-monopole ratio,
$\xi_2(s)/\xi_0(s)$ (Hamilton 1992, 1998; Peacock et al. 2001), when attempting to
interpret the small-scale nonlinear redshift-space clustering effects.
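As a rough illustration of Eq. 4, the sketch below averages a gridded $\xi(r_p, \pi)$ over the $r_p$ bins between 0.5 and 2 $h^{-1}$ Mpc and divides by the real-space power law evaluated at $\pi$. The function name, the grid layout, and the use of a plain unweighted mean in place of the spherical average (whose exact weighting is not spelled out here) are all assumptions.

```python
import numpy as np

def sigma_pi(xi_grid, rp_centers, pi_centers, r0, gamma,
             rp_min=0.5, rp_max=2.0):
    """Ratio of the rp-averaged xi(r_p, pi) to the power law (pi / r0)^-gamma.

    xi_grid    : array (n_rp, n_pi) with xi(r_p, pi)
    rp_centers : r_p bin centers (h^-1 Mpc)
    pi_centers : pi bin centers (h^-1 Mpc)
    r0, gamma  : power-law parameters obtained from the w_p(r_p) fit
    """
    xi_grid = np.asarray(xi_grid, dtype=float)
    rp_centers = np.asarray(rp_centers, dtype=float)
    pi_centers = np.asarray(pi_centers, dtype=float)
    # Assumption: simple mean over the r_p bins inside [rp_min, rp_max].
    in_range = (rp_centers >= rp_min) & (rp_centers <= rp_max)
    xi_bar = xi_grid[in_range].mean(axis=0)
    xi_real = (pi_centers / r0) ** (-gamma)
    return xi_bar / xi_real
```

A value of $\Sigma(\pi)$ above unity then marks line-of-sight separations where the redshift-space signal exceeds the real-space expectation.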

2.2 Correlation Function Estimation 

For our clustering statistics, we use the estimator of Landy & Szalay (1993):

$$\xi(s) = \frac{DD(s) - 2DR(s) + RR(s)}{RR(s)}, \tag{5}$$

where $DD$, $DR$ and $RR$ are the data-data, data-random and random-random weighted
pair counts computed from a data sample of $N$ galaxies and a random catalog of $N_r$
points. These pair counts are normalized by the number of all possible pairs,
typically by dividing by $N(N-1)/2$, $N N_r$ and $N_r(N_r-1)/2$, respectively, and
weighted by (Ross et al. 2012)

$$DD(r_p, \pi) = \sum_i \sum_j w_{{\rm tot},i}\, w_{{\rm tot},j}\, \Theta_{ij}(r_p, \pi), \tag{6}$$

with $w_{\rm tot}$ given by Eq. 32, and $\Theta_{ij}(r_p, \pi)$ represents a step
function which is 1 if $r_p$ belongs to the $i$-th and $\pi$ to the $j$-th bin, and 0
otherwise. These weights correct the galaxy densities to provide a more isotropic
selection, therefore they should not be applied to the random catalog, which is based
on an isotropic distribution. For randoms $w_{{\rm tot},i} = w_{{\rm tot},j} = 1$,
therefore

$$DR(r_p, \pi) = \sum_i \sum_j w_{{\rm tot},i}\, \Theta_{ij}(r_p, \pi), \tag{7}$$

$$RR(r_p, \pi) = \sum_i \sum_j \Theta_{ij}(r_p, \pi). \tag{8}$$

To evaluate the correlation function, we create a random catalog that has the same
selection as the BOSS CMASS galaxy data, matching both the redshift distribution and
sky footprint (see, e.g., Anderson et al. 2014). The method of random catalog
construction is almost identical to that described in Anderson et al. (2014), but
constructed to be ten times as dense as the galaxy data. We down-sample random points
based on sky completeness, and "shuffle" the observed galaxy redshifts, assigning them
to random sky positions so as to exactly reproduce the observed redshift distribution.
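The estimator of Eq. 5 can be sketched as follows, assuming the raw pair counts per bin are already available. The function name and argument layout are hypothetical; the normalization uses the unweighted pair totals quoted in the text, whereas weighted counts would strictly call for sums of weights, a refinement this sketch omits.

```python
import numpy as np

def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy & Szalay (1993) estimator from raw pair counts per bin.

    dd, dr, rr : arrays of data-data, data-random, random-random counts
    n_data     : number of galaxies, N
    n_rand     : number of random points, N_r
    """
    dd = np.asarray(dd, dtype=float) / (n_data * (n_data - 1) / 2.0)
    dr = np.asarray(dr, dtype=float) / (n_data * n_rand)
    rr = np.asarray(rr, dtype=float) / (n_rand * (n_rand - 1) / 2.0)
    return (dd - 2.0 * dr + rr) / rr

# Toy example with N = 3 galaxies and N_r = 4 randoms, two separation bins.
xi = landy_szalay(dd=[3, 6], dr=[12, 12], rr=[6, 6], n_data=3, n_rand=4)
```

In the toy example the first bin has data pairs exactly as frequent as random pairs, so the estimator returns zero there.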


2.3 Covariance Estimation 

To estimate the uncertainties in our clustering measurements, we utilize the jackknife
re-sampling technique (Quenouille 1956; Turkey 1958; Miller 1974; Norberg et al. 2009,
2011). There are known limitations to this type of error estimation (see, e.g.,
Norberg et al. 2009), but they have proven sufficient in analyses on scales similar to
our analysis (Zehavi et al. [...]; Ross et al. [...]; Anderson et al. 2012). The
jackknife covariance matrix for $N_{\rm res}$ re-samplings is computed by

$$C_{ij} = \frac{N_{\rm res} - 1}{N_{\rm res}} \sum_{a=1}^{N_{\rm res}} \left(\xi_i^a - \bar{\xi}_i\right)\left(\xi_j^a - \bar{\xi}_j\right), \tag{9}$$

where $\bar{\xi}_i$ is the mean jackknife correlation function estimate in the
specific bin,

$$\bar{\xi}_i = \sum_{a=1}^{N_{\rm res}} \xi_i^a / N_{\rm res}. \tag{10}$$

The overall factor in Eq. 9 takes into account the lack of independence between the
$N_{\rm res}$ jackknife configurations: from one copy to the next, only two
sub-volumes are different or, equivalently, $N_{\rm res} - 2$ sub-volumes are the same
(Norberg et al. 2009).


2.4 The MultiDark Simulation

MultiDark (Prada et al. 2011) is an N-body cosmological simulation with $2048^3$ dark
matter particles in a periodic box of $L_{\rm box} = 1\,h^{-1}$ Gpc on a side. The
first run, MDR1, was performed in 2010, with an initial redshift of $z = 65$, and a
mass resolution of $8.721 \times 10^9\,h^{-1} M_\odot$. It is based on the WMAP5
cosmology (Komatsu et al. 2009), with parameters: $\Omega_m = 0.27$,
$\Omega_b = 0.0469$, $\Omega_\Lambda = 0.73$, $n_s = 0.95$ and $\sigma_8 = 0.82$. Here
$\Omega$ is the present-day contribution of each component to the matter-energy
density of the Universe; $n_s$ is the spectral index of the primordial density
fluctuations, and $\sigma_8$ is the linear RMS mass fluctuation in spheres of
$8\,h^{-1}$ Mpc at $z = 0$.

MultiDark includes both the Bound Density Maxima (BDM; Klypin & Holtzman 1997; Riebe
et al. 2011) and the Friends-of-Friends (FOF; Davis et al. 1985) halo-finders. For the
current analysis, we use only BDM halos, which are identified as local density maxima
truncated at some spherical cut-off radius, from which unbound particles (i.e., those
particles whose velocity exceeds the escape velocity) are removed. According to the
overdensity limit adopted, two different BDM halo catalogs are produced: (i) BDMV:
halos extend up to $\Delta_{\rm vir} \times \rho_{\rm back}$, where
$\Delta_{\rm vir} = 360$ is the virial overdensity threshold,
$\rho_{\rm back} = \Omega_m \times \rho_c$ is the background or average matter
density, and $\rho_c$ is the critical density of the Universe; (ii) BDMW: the maximum
halo density is $\Delta_{200} \times \rho_c$, where $\Delta_{200} = 200$, which
implies that BDMW halos are smaller than BDMV ones. The bound density maxima algorithm
treats halos and sub-halos (those sub-structures whose virial radius lies inside a
larger halo) in the same way, with no distinction. In this work we use the BDMW halo
catalogs, since they better resolve the distribution of sub-structures in distinct
halos, leading to a clearer small-scale clustering signal.


2.5 Halo Occupation Distribution Model using Subhalos

The halo model (reviewed in Cooray & Sheth 2002) is a powerful tool to understand the
clustering of galaxies. The Halo Occupation Distribution (HOD; Berlind & Weinberg
2002) is a commonly used method of mapping galaxies to dark matter halos, which
characterizes the bias between galaxies and the underlying dark matter distribution.
The HOD is based on the conditional probability, $P(N|M)$, that a halo with mass $M$
contains $N$ galaxies of a given type. In our analysis, we apply the five-parameter
HOD formalism presented in Zheng et al. (2007) using the MDR1 simulation at
$z = 0.53$. First, we populate distinct halos with central galaxies whose mean is
given by the functional form

$$\langle N_{\rm cen}(M) \rangle = \frac{1}{2}\left[1 + {\rm erf}\left(\frac{\log M - \log M_{\rm min}}{\sigma_{\log M}}\right)\right], \tag{11}$$

where erf is the error function,
${\rm erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt$.

The free parameters are $M_{\rm min}$, the minimum mass scale of halos that can host a
central galaxy, and $\sigma_{\log M}$, the width of the cutoff profile. At a halo mass
of $M_{\rm min}$, 50% of halos host a central galaxy, which in terms of probability
means that $P(1) = 1 - P(0)$. If the relation between galaxy luminosity and halo mass
had no scatter, $\langle N_{\rm cen}(M) \rangle$ would be modeled by a hard step
function. In reality, this relation must possess some scatter, resulting in a gradual
transition from $N_{\rm cen} \simeq 0$ to $N_{\rm cen} \simeq 1$. The width of this
transition is $\sigma_{\log M}$. In order to place the satellite galaxies, we assume
their number in halos of a given mass follows a Poisson distribution, which is
consistent with theoretical predictions (Berlind & Weinberg 2002; Kravtsov et al.
2004; Zheng et al. 2005). We approximate the mean number of satellite galaxies per
halo with a power law truncated at a threshold mass of $M_0$,

$$\langle N_{\rm sat}(M) \rangle = \langle N_{\rm cen}(M) \rangle \left(\frac{M - M_0}{M_1'}\right)^{\alpha}. \tag{12}$$

The parameter $M_1'$ corresponds to the halo mass where $N_{\rm sat} \simeq 1$, when
(as in our case) $M_1' > M_0$ and $M_1' > M_{\rm min}$. When $\alpha = 1$ and
$M > M_0$, the mean number of satellites per halo is proportional to the halo mass. To
populate halos with satellite galaxies, we randomly extract from each host halo a
certain number of its sub-halos, following a Poisson distribution with mean given by
Eq. 12. The coordinates of these sub-halos become the locations of the satellites.
This approach, explored in previous works such as Kravtsov et al. (2004) and White et
al. (2011), is intrinsically different from the more commonly used procedure, in which
satellites are placed by randomly assigning them the positions of dark-matter
particles (see, e.g., Reid & Spergel 2009). In our case, satellite galaxies are
assigned by reflecting the original halo structure made of one central halo plus none,
one, or many sub-halos.

Figure 1. Five-parameter Halo Occupation Distribution model for MDR1, at $z = 0.53$.
The parametrization is from Zheng et al. (2007), and the input values from White et
al. (2011). The total (solid line) population of galaxies is the sum of two
contributions: central (dashed) and satellite (dot-dashed) galaxies.

© 0000 RAS, MNRAS 000, 000-000

Understanding BOSS/CMASS Galaxies

Figure 1 shows our HOD model built from MultiDark BDMW at $z = 0.53$, for the full
CMASS sample: central galaxies are represented by the dashed curve; satellites are the
dot-dashed line and the total contribution is the solid curve. As input parameters, we
adopt the values consistent with the BOSS CMASS HOD modeling in White et al. (2011).
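The occupation functions of this section (Eqs. 11 and 12) can be sketched as below. This is a minimal illustration, not the pipeline used for the mocks: the function names are hypothetical, $\log M_0$ and $\sigma_{\log M}$ take the values quoted later in the text, and the remaining three parameters are placeholders chosen only so that the example runs. Selecting which sub-halos actually host the satellites is not shown.

```python
import math
import numpy as np

# Values quoted in the text as fixed defaults (White et al. 2011).
LOG_M0 = 12.8633
SIGMA_LOGM = 0.5528
# Placeholder values, NOT from the paper: for illustration only.
LOG_MMIN = 13.0
LOG_M1 = 14.0
ALPHA = 1.0

_erf = np.vectorize(math.erf, otypes=[float])

def mean_ncen(log_m, log_mmin=LOG_MMIN, sigma_logm=SIGMA_LOGM):
    """Mean number of central galaxies per halo (Eq. 11)."""
    x = (np.asarray(log_m, dtype=float) - log_mmin) / sigma_logm
    return 0.5 * (1.0 + _erf(x))

def mean_nsat(log_m, log_m0=LOG_M0, log_m1=LOG_M1, alpha=ALPHA):
    """Mean number of satellites per halo (Eq. 12); zero below M_0."""
    mass = 10.0 ** np.asarray(log_m, dtype=float)
    excess = np.clip(mass - 10.0 ** log_m0, 0.0, None) / 10.0 ** log_m1
    return mean_ncen(log_m) * excess ** alpha

def draw_occupation(log_m, rng):
    """One random realization: Bernoulli centrals, Poisson satellites."""
    log_m = np.asarray(log_m, dtype=float)
    n_cen = (rng.random(log_m.shape) < mean_ncen(log_m)).astype(int)
    n_sat = rng.poisson(mean_nsat(log_m))
    return n_cen, n_sat

halo_log_mass = np.array([12.0, 13.5, 15.0])
n_cen, n_sat = draw_occupation(halo_log_mass, np.random.default_rng(0))
```

In a sub-halo based scheme like the one described above, `n_sat` would be the number of sub-halos randomly extracted from each host, capped by how many sub-halos the host actually contains.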


2.6 Analytic models 

Kaiser (1987) demonstrated that on large scales, where the linear regime holds, the
redshift-space correlation function can be factorized in terms of its real-space
version, $\xi(r)$, as

$$\xi(s) = \xi(r)\left(1 + \frac{2}{3}\beta + \frac{1}{5}\beta^2\right), \tag{13}$$

where $\beta$ is the Kaiser factor encoding the compression effect (Sec. 2.1) seen in
the clustering signal and $b$ is the linear bias between galaxies and the underlying
matter distribution. These two quantities can be related (e.g., Peebles 1980) through
the following approximation:

$$\beta \approx \Omega_m^{0.6} / b. \tag{14}$$


In general, one can decompose the redshift-space separation $s$ into its parallel and
transverse components to the line-of-sight and approximate $\xi(r)$ with the power law
in Eq. 1 to produce (Matsubara & Suto 1996):

$$\xi(r_p, \pi) = \xi(r)\left\{1 + \frac{2(1 - \gamma\mu^2)}{3 - \gamma}\,\beta + \frac{3 - 6\gamma\mu^2 + \gamma(2 + \gamma)\mu^4}{(3 - \gamma)(5 - \gamma)}\,\beta^2\right\}. \tag{15}$$

Here $\gamma$ is the power-law spectral index and $\mu$ is the cosine of the angle
between the separation and the line-of-sight direction. We include the small-scale
nonlinear FoG by convolving with a pairwise velocity distribution (Fisher et al. 1994;
Hamilton 1998; Croom et al. 2005), which can be modeled as an exponential,

$$f_{\rm exp}(w) = \frac{1}{\sqrt{2}\,\alpha} \exp\left(-\frac{\sqrt{2}\,|w|}{\alpha}\right), \tag{16}$$

or a Gaussian form,

$$f_{\rm norm}(w) = \frac{1}{\sqrt{2\pi}\,\alpha} \exp\left(-\frac{w^2}{2\alpha^2}\right), \tag{17}$$

where $\alpha$ is the pairwise velocity dispersion. The full model then becomes

$$\xi(r_p, \pi) = \int_{-\infty}^{+\infty} \xi(r_p, r_z(w))\, f(w)\, dw, \tag{18}$$

with $\xi(r_p, r_z(w))$ given by Equation 15. The quantity
$r_z(w) = \pi - w/(aH(z))$ is the line-of-sight component of the real-space distance
$r$, $a = (1 + z)^{-1}$ is the scale factor, and $H(z)$ is the Hubble parameter
evaluated at redshift $z$. The full $\Sigma(\pi)$ analytic model, as a function of
$\alpha$ and $\beta$, is obtained by averaging Eq. 18 in the range
$0.5 \leq r_p \leq 2\,h^{-1}$ Mpc and integrating the result in $\pi$ bins, as
explained in Section 2.1.
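To make the model of Eqs. 15-18 concrete, the sketch below evaluates the power-law Kaiser term and convolves it numerically with the Gaussian velocity kernel. It is written under explicit assumptions: the function names are hypothetical, velocities are in km/s, `a_h` stands for $aH(z)$ in km/s per $h^{-1}$ Mpc, the line-of-sight mapping is taken in the conventional form $r_z = \pi - w/(aH)$, and the infinite integral is truncated at $\pm 6\alpha$ and evaluated with the trapezoidal rule.

```python
import numpy as np

def xi_kaiser(rp, rz, r0, gamma, beta):
    """Linear-theory xi(r_p, r_z) for a power-law xi(r) (Eq. 15)."""
    r = np.hypot(rp, rz)
    mu = rz / r  # cosine of the angle to the line-of-sight
    xi_r = (r / r0) ** (-gamma)
    term1 = 2.0 * (1.0 - gamma * mu**2) / (3.0 - gamma) * beta
    term2 = ((3.0 - 6.0 * gamma * mu**2 + gamma * (2.0 + gamma) * mu**4)
             / ((3.0 - gamma) * (5.0 - gamma)) * beta**2)
    return xi_r * (1.0 + term1 + term2)

def xi_streaming(rp, pi, r0, gamma, beta, alpha, a_h, n_steps=2001):
    """Gaussian-kernel convolution of Eq. 18 at a single (r_p, pi), r_p > 0."""
    w = np.linspace(-6.0 * alpha, 6.0 * alpha, n_steps)  # km/s
    kernel = np.exp(-0.5 * (w / alpha) ** 2) / (np.sqrt(2.0 * np.pi) * alpha)
    rz = pi - w / a_h  # assumed mapping from velocity to distance
    integrand = xi_kaiser(rp, rz, r0, gamma, beta) * kernel
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(w)))
```

Averaging `xi_streaming` over $0.5 \leq r_p \leq 2\,h^{-1}$ Mpc and dividing by the real-space power law would then give a model curve to compare with the measured $\Sigma(\pi)$.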




Figure 2. $\Sigma(\pi)$ analytic model as a function of the pairwise velocity
dispersion, $A$ (top panel), and the parameter $G$, encoding the Kaiser factor (bottom
panel). Solid lines represent the Gaussian model given in Eq. 17; dashed curves are
the exponential functions in Eq. 16. We choose to model our $\Sigma(\pi)$ measurements
using the normal functional form only, since it reproduces more accurately the
small-scale feature produced by the FoG distortions and the peak at larger scales.


Combining these definitions and matching the binning in $\Delta r_p$ and
$\Delta\pi$, we have:

[Equation (19): expression for $\Sigma(\pi)$ not legible in the source]

Finally, we rename the parameters $\alpha$ and $\beta$ as $A$ and $G$, respectively,
to emphasize that they are fitted parameters that might differ slightly from their
theoretically motivated meaning. In this formalism, Eq. 14 simply becomes

$$G \approx \Omega_m^{0.6} / b. \tag{20}$$

The FoG and Kaiser effects could be overlapping and, as fit parameters in a model,
they are correlated. The importance of our modeling is not to isolate their value, but
to differentiate between models and data with sub-populations of galaxies. Figure 2
shows how both effects contribute to modulating our $\Sigma(\pi)$ model. There is a
degeneracy between the parameter values, in the sense that both increasing $A$ and
reducing $G$ produce an enhancement in the $\Sigma(\pi)$ peak. This dependence
prevents us from interpreting the $G$ parameter as the only one responsible for the
$\Sigma(\pi)$ amplitude.

2.7 Fitting $w_p(r_p)$

To implement the integral in Eq. 2 to estimate the projected correlation function
$w_p(r_p)$, we need to truncate it at some upper value, $\pi_{\rm max}$, above which
the contribution to the correlation function becomes negligible. If one includes very
large scales, the measurement will be affected by noise; conversely, if we consider
only very small scales, the clustering amplitude will be underestimated. In our case,
CMASS results are not sensitive to $\pi \geq 80\,h^{-1}$ Mpc, therefore we adopt this
value as our $\pi_{\rm max}$ limit. The projected auto-correlation function is related
to the real-space one by (Davis & Peebles 1983)

$$w_p(r_p) = 2 \int_{r_p}^{\infty} \frac{r\, \xi(r)}{\sqrt{r^2 - r_p^2}}\, dr. \tag{21}$$

Zehavi et al. (2005b) demonstrate that for a generic power law,
$\xi(r) = (r/r_0)^{-\gamma}$, the equation above can be written in terms of Euler's
Gamma function as

$$w_p(r_p) = r_p \left(\frac{r_p}{r_0}\right)^{-\gamma} \frac{\Gamma(1/2)\, \Gamma\left((\gamma - 1)/2\right)}{\Gamma(\gamma/2)}, \tag{22}$$

allowing one to infer the best-fit power law for $\xi(r)$ from $w_p(r_p)$,
corresponding to the full CMASS galaxy sample and to the blue and red sub-samples.
Figure 3 presents the power-law fits to the full, red and blue CMASS projected
correlation functions, and the resulting $(r_0, \gamma)$ optimal values.
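A power-law fit of this kind can be sketched by exploiting the fact that the standard Gamma-function expression for the projected correlation function of $\xi(r) = (r/r_0)^{-\gamma}$ is a straight line in log-log space, with slope $1 - \gamma$. The code below is an illustration only: the function names are hypothetical and the fit is an unweighted least-squares fit, whereas a real analysis would normally weight the points by their jackknife errors.

```python
import math
import numpy as np

def gamma_amplitude(gamma):
    """Gamma(1/2) Gamma((gamma - 1)/2) / Gamma(gamma/2)."""
    return (math.gamma(0.5) * math.gamma(0.5 * (gamma - 1.0))
            / math.gamma(0.5 * gamma))

def wp_power_law(rp, r0, gamma):
    """Projected correlation function of xi(r) = (r / r0)^-gamma."""
    rp = np.asarray(rp, dtype=float)
    return rp * (rp / r0) ** (-gamma) * gamma_amplitude(gamma)

def fit_power_law(rp, wp):
    """Unweighted log-log straight-line fit returning (r0, gamma)."""
    slope, intercept = np.polyfit(np.log(rp), np.log(wp), 1)
    gamma = 1.0 - slope
    # intercept = ln[ amplitude(gamma) * r0^gamma ]
    r0 = (math.exp(intercept) / gamma_amplitude(gamma)) ** (1.0 / gamma)
    return r0, gamma

rp_bins = np.logspace(-0.5, 1.5, 10)
r0_fit, gamma_fit = fit_power_law(rp_bins, wp_power_law(rp_bins, 8.0, 1.8))
```

The example recovers the input values because the synthetic data are an exact power law; measured $w_p(r_p)$ deviates from one, which is why improved halo-based models are preferred for detailed work.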


3 BOSS CMASS DATA 

BOSS target galaxies primarily lie within two main samples: CMASS, with
$0.43 < z < 0.7$, and LOWZ, with $z < 0.43$ (Ross et al. 2012; Anderson et al. 2012;
Bolton et al. 2012). These samples are selected on the basis of photometric
observations done with the dedicated 2.5-m Sloan Telescope (Gunn et al. 2006), located
at Apache Point Observatory in New Mexico, using a drift-scanning mosaic CCD camera
with five color bands, $ugriz$ (Gunn et al. 1998; Fukugita et al. 1996). Spectra of
the LOWZ and CMASS samples are obtained using the double-armed BOSS spectrographs,
which are significantly upgraded from those used by SDSS-I/II, covering the wavelength
range 3600-10000 Å with a resolving power of 1500 to 2600 (Smee et al. 2013).
Spectroscopic redshifts are then measured using the minimum-$\chi^2$ template-fitting
procedure described in Aihara et al. (2011), with templates and methods updated for
BOSS data as described in Bolton et al. (2012).

Figure 3. Power-law fits to the CMASS full, red and blue projected correlation
functions, which define the denominator in Eq. 4. The $r_0$ and $\gamma$ values we
find are consistent with Zehavi et al. (2005a), and show that red galaxies cluster
more than blue star-forming ones. The error bars correspond to the $1\sigma$
uncertainties estimated using 200 jackknife resamplings (Sec. 2.3).

We select galaxies from CMASS DR11 (Alam et al. 2015), North plus South Galactic
caps, which is defined by a series of color cuts designed to obtain a galaxy sample
with approximately constant stellar mass. Specifically, these cuts are:

$$17.5 < i_{\rm cmod} < 19.9 \tag{23}$$
$$r_{\rm mod} - i_{\rm mod} < 2 \tag{24}$$
$$d_\perp > 0.55 \tag{25}$$
$$i_{\rm fib2} < 21.5 \tag{26}$$
$$i_{\rm cmod} < 19.86 + 1.6\,(d_\perp - 0.8), \tag{27}$$

where $i_{\rm cmod}$ is the $i$-band cmodel magnitude. The quantities $i_{\rm mod}$
and $r_{\rm mod}$ are model magnitudes, $i_{\rm fib2}$ is the $i$-band magnitude
within a 2" aperture and $d_\perp$ is defined as

$$d_\perp = r_{\rm mod} - i_{\rm mod} - (g_{\rm mod} - r_{\rm mod})/8.0. \tag{28}$$

All the magnitudes are corrected for Galactic extinction using the dust maps from
Schlegel et al. (1998). In addition to the above color cuts, CMASS objects must also
pass two star-galaxy separation constraints:

$$i_{\rm psf} - i_{\rm mod} > 0.2 + 0.2\,(20.0 - i_{\rm mod}) \tag{29}$$
$$z_{\rm psf} - z_{\rm mod} > 9.125 - 0.46\, z_{\rm mod} \tag{30}$$

unless the objects also pass the LOWZ criteria. Therefore, to distinguish CMASS from
LOWZ candidates, it is necessary to select them by redshift.
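The selection described above maps naturally onto a boolean mask. The sketch below assumes the standard form of the CMASS cuts (Eqs. 23-30) and extinction-corrected magnitudes supplied as arrays; the function and argument names are hypothetical, and the exception for objects that also pass the LOWZ criteria is deliberately left out.

```python
import numpy as np

def cmass_mask(g_mod, r_mod, i_mod, i_cmod, i_fib2, i_psf, z_psf, z_mod):
    """Boolean mask of objects passing the CMASS color and star-galaxy cuts."""
    g_mod, r_mod, i_mod, i_cmod, i_fib2, i_psf, z_psf, z_mod = (
        np.asarray(x, dtype=float)
        for x in (g_mod, r_mod, i_mod, i_cmod, i_fib2, i_psf, z_psf, z_mod)
    )
    d_perp = (r_mod - i_mod) - (g_mod - r_mod) / 8.0       # Eq. 28
    color_cuts = (
        (i_cmod > 17.5) & (i_cmod < 19.9)                   # Eq. 23
        & (r_mod - i_mod < 2.0)                             # Eq. 24
        & (d_perp > 0.55)                                   # Eq. 25
        & (i_fib2 < 21.5)                                   # Eq. 26
        & (i_cmod < 19.86 + 1.6 * (d_perp - 0.8))           # Eq. 27
    )
    star_galaxy = (
        (i_psf - i_mod > 0.2 + 0.2 * (20.0 - i_mod))        # Eq. 29
        & (z_psf - z_mod > 9.125 - 0.46 * z_mod)            # Eq. 30
    )
    return color_cuts & star_galaxy

# Two made-up objects: the first passes, the second fails the d_perp cut.
mask = cmass_mask(
    g_mod=[21.6, 20.6], r_mod=[20.0, 20.0], i_mod=[19.0, 19.7],
    i_cmod=[19.0, 19.5], i_fib2=[21.0, 21.0], i_psf=[19.6, 20.3],
    z_psf=[19.3, 19.9], z_mod=[18.5, 19.0],
)
```

A redshift cut, such as the $z > 0.55$ selection used later in the analysis, would be applied on top of this mask.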

3.1 Color Selection 

The CMASS sample is mainly composed of massive, luminous, red galaxies, which are
favorite subjects for studying galaxy clustering. Among them, however, there is an
intrinsically bluer, star-forming population of massive galaxies (Ross et al.; Guo et
al. 2012), of which little is known. In an attempt to explore this bluer component and
understand its contribution to the clustering properties, we split the CMASS sample
into its blue and red components by applying the color cut

$$^{0.55}(g - i) = 2.35 \tag{31}$$

constant in redshift and K-corrected to the $z = 0.55$ rest-frame using the code by
Blanton & Roweis (2007). Masters et al. (2011) applied this same color cut, with no
K-corrections, to the BOSS CMASS DR8 sample to study the morphology of the LRG
population; Ross et al. (2014) used a similar selection, $(r - i) = 0.95$, to measure
galaxy clustering at the BAO scale in CMASS DR10. Figure 4 presents our CMASS color
selection, splitting the full sample into a denser red population (above the blue
horizontal line) and a sparse blue tail (below the line), whose completeness
dramatically increases when we move towards high redshift values ($z > 0.55$). For our
analysis, we focus on the high-redshift tail of the CMASS sample, selecting only
galaxies with $z > 0.55$.


3.2 Weights

Due to its structural features, a survey inevitably introduces some kind of spatial
variation in its measurements. To correct for these distortions, we weight our pair
counts by defining a combination of four different weights (Anderson et al. 2012;
Sánchez et al. 2012; Ross et al. 2012):

$$w_{\rm tot} = w_{\rm FKP}\, w_{\rm sys}\, (w_{\rm fc} + w_{\rm zf} - 1), \tag{32}$$

each one correcting for a different effect. In the expression above, $w_{\rm zf}$
accounts for targets with missing or corrupted redshift ($z$ failure); $w_{\rm fc}$
corrects for fiber collision, compensating for the fact that fibers cannot be placed
closer than 62" on the survey plates. This limitation prevents obtaining spectra of
all galaxies with neighbors closer than this angular distance in a single observation.
The default value of $w_{\rm zf}$ and $w_{\rm fc}$ is set to unity for all galaxies.
When a fiber collision is detected, we increment by one the value of $w_{\rm fc}$ for
the first neighbor closer than 62". In the same way, for a redshift failure we
increase by one the value of $w_{\rm zf}$ of the nearest galaxy with a good redshift.
To minimize the error in the measured clustering signal, we also require a correction
based on the redshift distribution of our sample, namely the $w_{\rm FKP}$ factor
(Feldman et al. 1994), that weights galaxies according to their number density,
$n(z)$. It is defined as

$$w_{\rm FKP} = \frac{1}{1 + n(z)\, P_{\rm FKP}}, \tag{33}$$

where $P_{\rm FKP}$ is a constant that roughly corresponds to the amplitude of the
CMASS power spectrum $P(k)$, at $k = 0.1\,h$ Mpc$^{-1}$. We assume
$P_{\rm FKP} = 2 \times 10^4\,h^{-3}$ Mpc$^3$, as in Anderson et al. (2012). The last
weight, $w_{\rm sys}$, accounts for a number of further systematic effects that could
cause spurious angular fluctuations in the galaxy target density. These effects are
treated in detail in Ross et al. (2012), but we do not include them in this analysis,
since they are not relevant at the scales considered in this work. Therefore we set
$w_{\rm sys} = 1$ in the following analysis.


Figure 4. BOSS CMASS DR11 color selection: the $(g - i)$ color
cut divides the full sample into a red dense population (above the
blue horizontal line) and a sparse blue tail (below the line).


4 MODELING FULL CMASS SAMPLE 
4.1 Full CMASS Clustering 

We construct an HOD model using MultiDark halos and
sub-halos (see model description in Section 2.5), and produce
a mock galaxy catalog which we compare to the full
CMASS DR11 population. This mock is built by varying the
HOD parameters to match $\xi(s)$, populating the MD simulation
in each step, and using the peculiar velocities in the
simulation to model redshift-space distortions. The intention
is that changing the HOD will constrain the overall galaxy
bias, hence we fit only one statistic. We then evaluate and
further investigate these fits over the three clustering
metrics: $\xi(s)$, $w_p(r_p)$ and $\Sigma(\pi)$.

However, since implementing a formal fit to determine
the optimal HOD parameters is beyond the scope of this
work, we improve the matching empirically, changing the
input values until we find a suitable ($\log M_{\rm min}$, $M_0$, $M_1'$,
$\alpha$, $\sigma_{\log M}$) set that reproduces the observed $\xi(s)$
amplitude. We fit only $M_{\rm min}$ (the minimum halo mass), $M_1'$ (the
mass scale of the satellite cut-off profile) and $\alpha$ (the satellite
slope). The remaining parameters are fixed to their default values given
by White et al. (2011): $\log M_0 = 12.8633$, $\sigma_{\log M} = 0.5528$.
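To make the role of the three fitted parameters concrete, the sketch below evaluates a mean halo occupation. The functional form is an assumption: it is the widely used five-parameter form of Zheng et al. (2007), which may differ in detail from the prescription adopted in this work. The default parameter values are those quoted in the text and in Table 1 for the full sample; the function names are illustrative.

```python
import math


def n_cen(log_m, log_mmin=13.00, sigma_logm=0.5528):
    """Mean number of central galaxies in a halo of mass 10**log_m.

    Assumed form: 0.5 * [1 + erf((log M - log M_min) / sigma_logM)].
    """
    return 0.5 * (1.0 + math.erf((log_m - log_mmin) / sigma_logm))


def n_sat(log_m, log_mmin=13.00, log_m1=13.30, alpha=0.20,
          log_m0=12.8633, sigma_logm=0.5528):
    """Mean number of satellites.

    Assumed form: N_cen * ((M - M_0) / M_1')**alpha for M > M_0, else 0.
    """
    m, m0, m1 = 10.0 ** log_m, 10.0 ** log_m0, 10.0 ** log_m1
    if m <= m0:
        return 0.0
    return n_cen(log_m, log_mmin, sigma_logm) * ((m - m0) / m1) ** alpha


def n_tot(log_m, **kwargs):
    """Total mean occupation, centrals plus satellites."""
    cen_args = {k: v for k, v in kwargs.items()
                if k in ("log_mmin", "sigma_logm")}
    return n_cen(log_m, **cen_args) + n_sat(log_m, **kwargs)
```

In this form, raising $M_{\rm min}$ removes low-mass hosts (changing the number density), while $M_1'$ and $\alpha$ set the satellite contribution and hence the satellite fraction.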



Figure 5. Redshift-space monopole correlation functions of our
$z = 0.53$ MultiDark full mock galaxy catalog (solid line) compared
to BOSS CMASS DR11 measurements. Error bars are estimated
using 200 jackknife regions.
The specific choice of these three parameters arises from
their connection to two physical quantities we want to measure:
(i) the satellite fraction, $f_{\rm sat}$, that controls the slope of
the 1-halo term at small scales, where sub-structures of the
same halo dominate; (ii) the galaxy number density, $n(z)$,
affecting the 2-halo term at larger scales, where correlations
between sub-structures of different hosts become appreciable.
Figure A1 in the Appendix illustrates how a change in
$M_{\rm min}$, $M_1'$ and $\alpha$ affects the projected correlation function.

Figure 5 displays the redshift-space monopole of our empirical
best-fit mock galaxy catalog from the MultiDark simulation
($\chi^2 = 11.08/7$ dof, including the full covariance matrix
computed with jackknife); the HOD parameters are given in Table 1.
The projected correlation function, $w_p(r_p)$, and the line-of-sight
statistic, $\Sigma(\pi)$, corresponding to this model are shown in
Figure 6. In agreement with many previous works (Zehavi et al. 2004,
2005b; Guo et al. 2012), we find that CMASS galaxies are more
highly clustered at small scales (1-halo regime); then, as the
spatial separation between the pairs increases, the clustering
strength drops (2-halo term). Compared to White et al.




Figure 6. Projected correlation function (top) and $\Sigma(\pi)$ (bottom)
for the $z = 0.53$ MultiDark full mock galaxy catalog (solid
line), compared to BOSS CMASS DR11 measurements. Error
bars are estimated using 200 jackknife regions containing the same
number of randoms.


(2011), our best-fit mock has a much lower satellite slope,
$\alpha$, and $M_1'$, resulting in a higher satellite fraction (about
27%); however, our mean satellite occupation function is
compatible with results from Guo et al. (2015). Overall,
the clustering amplitude of our model galaxies is in good agreement
with observations. Error bars are estimated using 200 jackknife
regions gridded in right ascension and declination as
follows: $10\,({\rm RA}) \times 15\,({\rm DEC})$ cells for the CMASS
North Galactic Cap ($N_{\rm res} = 150$), plus $5\,({\rm RA}) \times 10\,({\rm DEC})$
regions for the South Galactic Cap ($N_{\rm res} = 50$). This approach
produces 200 equal areas of about 100 deg$^2$ each.
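The jackknife error estimate can be sketched as follows. The snippet assumes the standard delete-one jackknife covariance, $C_{ij} = \frac{N_{\rm res}-1}{N_{\rm res}} \sum_k (\xi_i^k - \bar{\xi}_i)(\xi_j^k - \bar{\xi}_j)$, where $\xi^k$ is the statistic measured with region $k$ removed; the function name and the toy input are illustrative only.

```python
def jackknife_covariance(samples):
    """Delete-one jackknife covariance matrix.

    samples[k][i] is the statistic in bin i measured after removing
    jackknife region k. Returns a nested list cov[i][j].
    """
    n_res = len(samples)
    n_bins = len(samples[0])
    mean = [sum(s[i] for s in samples) / n_res for i in range(n_bins)]
    factor = (n_res - 1.0) / n_res
    return [[factor * sum((s[i] - mean[i]) * (s[j] - mean[j])
                          for s in samples)
             for j in range(n_bins)]
            for i in range(n_bins)]


# Number of regions from the RA x DEC gridding quoted in the text.
n_regions = 10 * 15 + 5 * 10

# Toy example with three resamplings and two bins (not real data).
toy_cov = jackknife_covariance([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

The square roots of the diagonal elements give the error bars, while the off-diagonal elements carry the bin-to-bin correlations used in the covariant fits.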

In the calculation of the full CMASS (MD mock) $\Sigma(\pi)$,
we use the best-fit power law to the full
CMASS (MD mock) $w_p(r_p)$. The corresponding $r_0$ and $\gamma$
estimates are given in Figure 3. Beyond 8-10 $h^{-1}$Mpc, where
the Kaiser squashing becomes predominant, the jackknife
uncertainties on $\Sigma(\pi)$ are wider. This measurement reveals
that the deviation of $\xi(r_p,\pi)$ from the real-space behavior
changes dramatically with scale:
at very small line-of-sight separations, $\pi \lesssim 2\,h^{-1}$Mpc,
where the fingers-of-god dominate, the contribution of peculiar
velocities pushes $\Sigma(\pi)$ below unity. Above $3\,h^{-1}$Mpc,
$\Sigma(\pi)$ increases sharply and peaks around $8\,h^{-1}$Mpc. On
larger scales, the correlation between pairs of galaxies is
compressed along the line of sight since the Kaiser infall
dominates and $\Sigma(\pi)$ drops.

4.2 Modeling Redshift-Space Distortions and 
Galaxy Bias 

In redshift-space, two different distortion features are observed:
the finger-of-god effect, which dominates below
$2\,h^{-1}$Mpc, and the Kaiser flattening, which becomes important
beyond 10-15 $h^{-1}$Mpc. These phenomena preferentially
manifest themselves on different scales, but a certain
degree of entanglement is unavoidable in both regimes. In
order to better separate the two effects, we examine $\Sigma(\pi)$ in
our MultiDark full mock catalog in three different configurations:
real-space, redshift-space with only Kaiser effect and


Figure 7. $\Sigma(\pi)$ in real-space (dot-dashed line), redshift-space
with only Kaiser contribution (dashed) and Kaiser plus finger-of-god
(solid). As expected, the real-space behavior is close to unity
at all scales.

Figure 8. $\Sigma(\pi)$ full CMASS DR11 measurement (left panel) and our MultiDark $z = 0.53$ mock (right panel), versus their $A$, $G$
analytic model (dashed lines). For both data and mock sets we assume the errors are given by our jackknife estimate, computed using
200 resamplings. The fits are performed by using the full covariance matrix. These plots reveal that the full CMASS sample and the
MultiDark model galaxies share almost the same large-scale bias value, while the peculiar velocity contribution is higher in the mocks.


full redshift-space (FoG+Kaiser), as shown in Figure 7. The
real-space $\Sigma(\pi)$ is computed by omitting the peculiar
velocities both in the numerator and in $w_p(r_p)$, to which we fit
the power law in the denominator. Since $\Sigma(\pi)$ is then the ratio
between two spherically averaged power laws, we expect it
to be close to unity at all scales. Hence, the dot-dashed line
in Figure 7 is compatible with expectations. The redshift-space
case with only Kaiser contribution (dashed line) is
computed by assigning satellite galaxies their parental $v_{\rm pec}$
value. In this way, each satellite shares the coherent motion
of its parent, but it does not show any random motion with
respect to it. The last case considered is the full redshift-space
$\Sigma(\pi)$ (solid line), in which satellite galaxies have their
own peculiar velocity, which is independent of that of their parents.

We are now able to provide a full description of our $\Sigma(\pi)$
results by modeling them analytically in terms of four
parameters: the power-law correlation length, $r_0$, its slope
$\gamma$, the pairwise velocity dispersion, $A$, and the $G$ parameter,
which is inversely proportional to the linear galaxy bias, $b$.

The linear galaxy bias is scale dependent and has been
computed (e.g., Nuza et al. 2013) from the ratio between the
galaxy and matter correlation functions,

$$ b(s) = \sqrt{\frac{\xi(s)}{\xi_m(s)}}. \qquad (34) $$

Our goal is to provide an estimate of both the peculiar
velocity field causing the distortions we observe in redshift-space
in our clustering measurements and the large-scale
bias, using the $A$, $G$ values we find from our full, red and blue
CMASS and MultiDark $\Sigma(\pi)$ modeling. To this purpose, we
do not compute the bias as Nuza et al. (2013), through Eq. 34,
but we estimate it from the best-fit $G$ value.

Figure 8 displays the $A$, $G$ models (dashed curves) for
our CMASS measurements (left panel, squares) and full
MultiDark mock catalog (right panel, crosses). All the model
fits are performed including the full covariance matrix,
estimated by using 200 jackknife re-samplings. For the
MultiDark mock, we assume the same scatter as for the CMASS
data.

Adopting a normal function to mimic the contribution of
peculiar velocities (see Table 2) results in
MD model galaxies that have a slightly higher bias (that is,
a lower $G$ value) than the full CMASS population,
and a higher peculiar velocity contribution (a higher $A$
value). This result is in agreement with the bottom panel
of Figure 6: CMASS data points (diamonds) experience a
stronger Kaiser squashing at $\sim 10\,h^{-1}$Mpc, i.e., they have a
smaller large-scale bias, compared to the MultiDark model
galaxies (solid line). From these $A$, $G$ values, we conclude
that our full MD mock catalog can be considered a reliable
representation of the full CMASS sample.

The reduced $\chi^2$ values we derive from the full CMASS
and MultiDark $\Sigma(\pi)$ model fits are relatively high, compared
to the $\chi^2$ values we find for $\xi(s)$, which are reported in the
caption of Table 1. The main reason for this result is the nature
of $\Sigma(\pi)$, which is a "derived" clustering measurement, in
the sense that it is built from the ratio of the 2PCF,
spherically averaged in the range $0.5 \leq r_p \leq 2\,h^{-1}$Mpc,
over a real-space term. In order to mimic this average in
our model, we must perform a double integration in $(r_p,\pi)$
of the convolution (i.e., the inner integral in the numerator
of the model) of the real-space term with the peculiar velocity
contribution, $f(w)$. Such a double integration has to be computed
numerically, in $(r_p,\pi)$ bins. In this way, we eliminate
the dependence on $r_p$ (we are left with a single "mean" $r_p$
value, representing the 0.5-2 $h^{-1}$Mpc bin) and maintain
the $\pi$ dependence (we are left with a "mean" $\pi$
value for each $\pi$ bin). Thus, the $A$, $G$ model reproduces the
$\Sigma(\pi)$ measurement in bins of $(r_p,\pi)$ and not analytically at
each point. This is a first approximation.

Also, we assume for the peculiar velocity term, $f(w)$,
a Gaussian functional form, but this is an arbitrary
choice, which introduces another approximation. In
addition, the denominator of the model, which is nothing but
the best-fit power law to $w_p(r_p)$, spherically averaged in the
way described above, presents the same numerical issues as
the numerator.

In conclusion, we are applying a series of approximations
that are necessary in order to define our $\Sigma(\pi)$ model,
but they unavoidably affect the accuracy of the fit.

Since our goal is to give a qualitative estimate, in terms
of linear bias and redshift-space distortions, of the full, red
and blue CMASS samples, we do not heavily focus on the
goodness of our $\Sigma(\pi)$ model fits, but instead we stress the
importance of a cross-comparison, in terms of $A$, $G$ values,
between the three CMASS populations and the MultiDark
model galaxies. In particular, for the full CMASS case $b \sim 3$,
which is relatively high compared to the value $b \sim 2$
reported in Nuza et al. (2013). This disagreement can be
justified by recalling that we are selecting only the high-redshift
tail of the CMASS sample, at $z > 0.55$, and for
those galaxies the bias is expected to be higher than
the full CMASS bias in Nuza et al. (2013).

4.3 Full CMASS Covariance 

We compute the full CMASS jackknife covariance matrix
for the three metrics of interest, in which $\xi$ is
either $\xi(s)$, $w_p(r_p)$, or $\Sigma(\pi)$. We estimate the goodness of
our model fits to the CMASS measurements by computing
the corresponding $\chi^2$ values as

$$ \chi^2 = A^{T}\,\hat{C}^{-1} A, \qquad (35) $$

where $A = (\xi_{\rm data} - \xi_{\rm model})$ is a vector with $i = 1, \ldots, n_b$
components and $\hat{C}^{-1}$ is an unbiased estimate of the inverse
covariance matrix (Hartlap et al. 2007; Percival et al. 2014),

$$ \hat{C}^{-1} = (1 - D)\,C^{-1}, \qquad D = \frac{n_b + 1}{N_{\rm res} - 1}. \qquad (36) $$

In the equation above, $n_b$ is the number of observations
and $N_{\rm res}$ the number of jackknife re-samplings. For the full
CMASS population, the correction factor $(1 - D)$ represents
an 8% effect on the final $\chi^2$ value.

In the Appendix we test our jackknife error estimates
using a set of 100 Quick Particle Mesh (QPM; White et al.
2014) galaxy mock catalogs.
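A minimal sketch of the covariant $\chi^2$ with the inverse-covariance correction is given below. It assumes that $D$ takes the Hartlap et al. (2007) form, $D = (n_b + 1)/(N_{\rm res} - 1)$; the small linear solver and all function names are illustrative choices made to keep the example free of external dependencies.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            f = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= f * a[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return x


def hartlap_factor(n_bins, n_res):
    """Correction (1 - D), assuming D = (n_bins + 1) / (n_res - 1)."""
    return 1.0 - (n_bins + 1.0) / (n_res - 1.0)


def chi2(data, model, cov, n_res):
    """Covariant chi^2 using the corrected inverse covariance."""
    diff = [d - m for d, m in zip(data, model)]
    cinv_diff = solve(cov, diff)  # C^{-1} (data - model)
    raw = sum(d * c for d, c in zip(diff, cinv_diff))
    return hartlap_factor(len(data), n_res) * raw
```

Solving the linear system instead of inverting the covariance matrix explicitly is a numerical convenience; the result is the same quadratic form.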


5 MODELING COLOR SUB-SAMPLES 

We repeat the same analysis described in Section 4 on the
red and blue color sub-samples. We first use $\xi(s)$ to fit an
HOD and match the overall clustering, then use our analytic
model to obtain fits for $A$ and $G$. There remains the question
of how to model the sub-populations in the mocks; we explore
two methods.

5.1 Independent Red and Blue models 

For simplicity, our first attempt at the color sub-samples is
to individually model the red and blue CMASS populations.
That is, we assume the clustering comes from a complete
sample and we generate an HOD populating halos independently
of whether a galaxy is red or blue. By construction,
the two mocks are not connected to each other, and the same halo or



                            Total    Red      Blue
$\log M_{\rm min}$          13.00    13.10    12.50
$\log M_1'$                 13.30    13.02    13.85
$\alpha$                    0.20     0.22     0.15
$f_{\rm sat}$               0.27     0.33     0.11
$\langle \log M_h \rangle$  12.75    13.00    12.50

Table 1. Our best empirical estimates of the HOD parameters
for the total, red and blue independent models of the CMASS
populations. We obtain these values only by fitting $\xi(s)$ with a
three-dimensional grid in $\log M_{\rm min}$, $\log M_1'$ and $\alpha$. The
resulting $\chi^2$ values are: 11.08/7 dof (full CMASS), 13.54/7 dof (red)
and 14.91/7 dof (blue).


sub-halo could host either a red or a blue galaxy in the
corresponding mocks. This is an over-simplified view, as clearly
a galaxy can be either red or blue and not both. However,
it is an assumption that is embedded within several related
analyses (Zehavi et al. 2004, 2005b; Guo et al. 2012, 2014).

Figure 9 shows the agreement between the CMASS
monopole, projected 2PCF and $\Sigma(\pi)$ measurements and
our independent red and blue model galaxies. Our empirical
best-fit HOD parameter values are reported in Table 1,
together with the satellite fraction; the fraction is higher
for red than for blue galaxies, confirming that luminous red
galaxies tend to live in denser environments (Wang et al.
2007; Zehavi et al. 2005b; Swanson et al. 2008). We conclude
that we are able to fit all our red and blue CMASS
clustering results correctly, by means of the same HOD technique,
with small variations in its input parameters. However, these
red and blue independent models are non-physical, because
they allow the same mock galaxy to appear as both red and blue. In other
words, they can place a red and a blue galaxy at the same position
in the same hosting halo, which is not the case in reality.

To overcome this problem, we propose an alternative halo
occupation distribution approach (see next Section) in which
the red and the blue models are obtained by splitting the
full mock catalog into sub-populations that match the observed
red/blue CMASS galaxy fractions. In this way, the
red and blue model galaxies are no longer independent and,
by construction, they cannot occupy the same positions in
a given halo.


5.2 Splitting Color Samples using Galaxy 
Fractions 

Inspired by the result in the previous section, we developed
a more physically motivated model of red/blue color separation.
In line with the standard halo model, we explore
a splitting method based entirely on host halo mass, with
each sub-population matching the corresponding observed CMASS
galaxy fraction. By modeling these red/blue fractions, $f_{b,r}$,
as a function of the central halo mass, we are able to correlate
the red and the blue mock catalogs to the full one,
reducing the number of free parameters from 15 (5 for each
independent HOD) to 5 (full HOD) plus 2 (constraint on
galaxy fractions). Our galaxy fraction model must verify two
conditions: (i) to obtain reliable results, the models must
reproduce the overall $f_{b,r}$ values observed in our CMASS

Figure 9. Independent mock catalogs designed to model CMASS DR11 red and blue $w_p(r_p)$ and $\Sigma(\pi)$ measurements (points and
squares). The error bars are the $1\sigma$ regions estimated using 200 jackknife re-samplings of the data. Although we fit only $\xi(s)$, we find good
agreement between data and mocks in all three statistics. As expected, red galaxies show a higher clustering amplitude compared to
the blue population.


red/blue selection; this is done by requiring that

$$ \frac{1}{N}\sum_{i=1}^{N} f_b\big(\log M_h^{(i)}\big) = 0.25, \qquad \frac{1}{N}\sum_{i=1}^{N} f_r\big(\log M_h^{(i)}\big) = 0.75, \qquad (37) $$

where we allow 20% of scatter, and (ii) the red (blue) fraction
must approach zero at low (high) mass scales. We build
our model as a function of the central halo mass only, omitting
the dependence on satellite masses. Despite this simplifying
assumption, the resulting red and blue mocks correctly match
the observed clustering amplitude. To mimic the
red/blue split, we test different functional forms of $f_{b,r}$,
starting with a basic linear one (Figure 10, dashed line)
and two different log-normal models (dot-dashed and dotted
curves) with three degrees of freedom each; they are treated
in detail in the Appendix. In order to produce a clear separation
between the two populations, the best compromise is
an inverse tangent-like function (solid line), with only two
free parameters. The resulting functional form, as a function
of the central halo mass, is

$$ f_b(\log M_h) = \frac{1}{2} - \frac{1}{\pi}\tan^{-1}\left[\frac{\log M_h - D}{10^{C}}\right], \qquad (38) $$

$$ f_r(\log M_h) = 1 - f_b(\log M_h), $$

where the parameter $C$ determines how rapidly the blue fraction
drops and $D$ sets the halo mass at which the blue fraction
equals one half. Applying these fractions to the full CMASS mock
catalog, we select the $(C, D)$ combination that globally best fits
the observed red and blue redshift-space auto-correlation functions,
$\xi(s)$. The best-fit values are $C = -0.50$, $D = 12.50$, with
$\chi^2_{\rm red} = 15.43/5$ dof, $\chi^2_{\rm blue} = 6.20/5$ dof and
$\chi^2_{\rm tot} = 10.82/10$ dof.
We use these red and blue inverse tangent mocks to match
the other two statistics, $w_p(r_p)$ and $\Sigma(\pi)$, which are shown
in Figure 11, and the cross-correlation functions in Fig. 12.
The $\xi(s)$ fit is performed using the full covariance matrix
and the uncertainties are estimated via jackknife resampling.

Figure 10. Blue galaxy fraction models, $f_b$, and the corresponding
Poisson error, as a function of the central halo mass: linear
(dashed line), log-normal I (dot-dashed), log-normal II (dotted),
inverse tangent (solid). The red galaxy fractions are recovered by
$f_r = 1 - f_b$.
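The inverse-tangent split can be illustrated with a short sketch. It assumes the form $f_b(\log M_h) = 1/2 - (1/\pi)\tan^{-1}[(\log M_h - D)/10^{C}]$ with the best-fit $C = -0.50$ and $D = 12.50$. The per-galaxy random draw used to assign colors, the function names and the seed are assumptions made for this example; the text only requires that the split reproduce the observed fractions.

```python
import math
import random


def blue_fraction(log_mh, c=-0.50, d=12.50):
    """Assumed inverse-tangent blue fraction versus central halo mass."""
    return 0.5 - math.atan((log_mh - d) / 10.0 ** c) / math.pi


def red_fraction(log_mh, c=-0.50, d=12.50):
    """Complementary red fraction, f_r = 1 - f_b."""
    return 1.0 - blue_fraction(log_mh, c, d)


def split_by_color(log_masses, c=-0.50, d=12.50, seed=0):
    """Assign each mock galaxy to exactly one color.

    Returns (red_indices, blue_indices); the two lists never overlap,
    so the red and blue mocks are complementary by construction.
    """
    rng = random.Random(seed)
    red, blue = [], []
    for idx, log_mh in enumerate(log_masses):
        if rng.random() < blue_fraction(log_mh, c, d):
            blue.append(idx)
        else:
            red.append(idx)
    return red, blue
```

Because every galaxy receives a single label, this construction avoids the overlap that affects the independent red and blue mocks.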

The cross-correlations between red and blue CMASS
galaxies behave similarly to the auto-correlation functions:
they are stronger on small scales and weaker when the pair
separation increases.

These functions represent a consistency check of our
red/blue fitting scheme and they provide robust information
about red and blue galaxy bias: the younger and more
star-forming a galaxy is, the lower its clustering amplitude
and bias.

Figure 13 displays the red and blue HOD models inferred
by splitting the full MultiDark mock using the observed
CMASS red/blue galaxy fraction. The lines are the
predictions computed by normalizing $\langle N_c \rangle$,
$\langle N_s \rangle$, $\langle N_t \rangle$ by $f_{b,r}$.
For red galaxies the HOD shape is compatible with the
standard HOD model, confirming that the red/blue separation
we imposed with the galaxy fraction constraint is
reliable for the red population. For blue mocks, the average
number of galaxies per halo mass is about 10 times lower
than for the red ones at $\log M_h = 13.5$, and drops
almost linearly (3% factor) as the halo mass increases. Such
a trend reflects the preference of blue star-forming galaxies
to populate low-mass halos.

From this analysis, we estimate the conditional probability,
$P(M_h|G)$, that a galaxy $G$ with a specific color is
hosted by a central halo having mass $M_h$; see Figure 14.
As expected, the result demonstrates that CMASS early-type
redder galaxies are associated with more massive halos
than their late-type bluer companions.

Figure 11. CMASS DR11 red and blue clustering measurements versus mocks. The models are obtained by splitting the full MultiDark
mock into its red and blue components, matching the observed CMASS red/blue galaxy fraction, $f_{b,r}$. In this way, we prevent the
same mock galaxy from being both red and blue, and guarantee the reliability of the model. We find good agreement between the CMASS
measurements and our MultiDark mocks, and confirm that red galaxies live in denser environments compared to the blue population.

Figure 12. Red-blue CMASS DR11 (diamonds) versus inverse tangent mock (lines) cross-correlation functions. These plots are useful
to check the mutual behavior of the red and the blue CMASS samples. In fact, as expected, we find that the cross-correlation of
these galaxies lies in between their auto-correlation functions, and the size of the error bars (computed with 200 jackknife resamplings)
is consistent with the uncertainties on their individual clustering measurements.

6 RESULTS 

6.1 Red and Blue $A$, $G$ models

We apply the same $A$, $G$ modeling performed in Section 4.2
for the full CMASS sample and the MultiDark full mock
galaxy catalog to the red and blue data samples and $f_{b,r}$
mocks, to quantify how significant their differences are at the
level of large-scale bias and redshift-space distortions. Our
main results are presented in Figure 15: the top row displays
the red and blue $\Sigma(\pi)$ CMASS measurements (points and
squares), versus the analytic models (dashed lines); in the
bottom row are the results for the red and blue MD mocks
(solid lines), versus their models (dashed curves). For both
CMASS data and MD mocks we assume the errors are given
by our jackknife estimate, computed using 200 resamplings. All
the model fits are fully covariant and our best estimates of
the $A$, $G$ parameters are reported in Table 2.

As expected, the blue CMASS galaxies are less biased
than the red population and have a lower peculiar velocity
contribution, which results in a lower clustering amplitude.
A similar behavior is seen in a comparison of the red and
the blue MultiDark model galaxies, suggesting that we are
correctly modeling our results in
terms of redshift-space distortions and large-scale bias. As
previously discussed in Section 4.2, our relatively high bias
values are due to the fact that we are selecting the high-redshift
tail ($z > 0.55$) of the CMASS galaxies, for which
the bias is expected to be higher than the typical value reported
by Nuza et al. (2013), $b \sim 2$. Also, the fact that our
analysis produces high $\chi^2$ values is due to how the $\Sigma(\pi)$
measurement is built in terms of the 2PCF and to the numerical
limitations of the $A$, $G$ model.

Figure 13. Red and blue HOD models obtained by applying the galaxy red/blue fraction condition to the MultiDark mock catalog for
the full CMASS population. The lines are the predictions computed by normalizing $\langle N_c \rangle$, $\langle N_s \rangle$, $\langle N_t \rangle$ by $f_{b,r}$. For red galaxies, the HOD
shape is consistent with the standard HOD model, confirming that the red/blue galaxy separation we are imposing with the galaxy fraction constraint is
reliable for the red population. For blue mocks, the expected average number of galaxies per halo mass is about 10 times lower than for red
ones at $\log M_h = 13.5$, and drops almost linearly as the halo mass increases. This reveals that blue star-forming galaxies preferentially
populate low-mass halos.

Figure 14. Conditional probability that a given galaxy $G$ with a specific color is hosted by a central halo with mass $M_h$, obtained from
our red and blue independent mock catalogs (left) and applying the galaxy fraction constraint (right). In both cases, as expected, we
find that red galaxies live in more massive halos compared to the blue ones.

Figure 16 presents the 68% and 95% covariant confidence
regions of the $A$, $G$ models for the CMASS measurements.
The $1\sigma$ blue region is spread out: due to their larger
uncertainties, blue galaxies have less power to constrain the
$A$, $G$ values compared to the red and full CMASS populations.
The dots indicate the position of the best-fit models
for the three samples. As seen in Figure 15, red CMASS
galaxies possess higher velocity dispersion and large-scale
bias compared to the blue sample.


6.2 Large-scale bias

The linear bias factor $b$, defined in Eq. 34, is related to the
red-blue cross-correlation, $\xi_\times(s)$, by

$$ b_r(s)\, b_b(s) = \frac{\xi_\times(s)}{\xi_m(s)}, \qquad (39) $$

where the subscripts $r$, $b$ indicate, respectively, red and blue
galaxies, and $\xi_m(s)$ is the dark matter correlation function.
We then expect that the ratio $\xi_\times(s)/\sqrt{\xi_r(s)\,\xi_b(s)}$, where
each term in the denominator is given by Eq. 34, is close
to unity. Figure 17 shows that our analysis produces a result
that is consistent with expectations within 5%.
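The consistency check on the cross-correlation can be written compactly. The sketch below assumes that the bias is defined through $b^2(s) = \xi(s)/\xi_m(s)$, so that a purely linear, deterministic bias gives $\xi_\times/\sqrt{\xi_r\,\xi_b} = 1$; the function names and the toy numbers are illustrative and are not measurements.

```python
import math


def bias_from_xi(xi_gal, xi_matter):
    """Linear bias, assuming b^2 = xi_gal / xi_matter at fixed separation."""
    return math.sqrt(xi_gal / xi_matter)


def cross_ratio(xi_cross, xi_red, xi_blue):
    """Ratio xi_x / sqrt(xi_r * xi_b); expected to be close to unity."""
    return xi_cross / math.sqrt(xi_red * xi_blue)


# Toy example at a single separation (hypothetical values).
xi_m = 0.05          # matter correlation function
b_red, b_blue = 3.0, 2.0
xi_red = b_red ** 2 * xi_m
xi_blue = b_blue ** 2 * xi_m
xi_cross = b_red * b_blue * xi_m

ratio = cross_ratio(xi_cross, xi_red, xi_blue)
```

Departures of this ratio from unity in real data indicate noise or a breakdown of the linear bias assumption at the separation considered.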
Figure 15. Top row: CMASS DR11 $\Sigma(\pi)$ red (left) and blue (right) measurements and the $A$, $G$ analytic models (dashed lines). Bottom
row: $f_{b,r}$ MultiDark mocks (solid curves) and their models (dashed lines). For the mocks we adopt the jackknife errors estimated for the
blue CMASS data. These fits are fully covariant. From these plots we conclude that blue CMASS galaxies are less biased
and show a lower peculiar velocity contribution compared to the red population.



              $A$ (km s$^{-1}$)    $G$                         $b$          $\chi^2$
Full CMASS    $384 \pm 6$          $0.15 \pm 0.01$             $\sim 3$     16.89/5 dof
Full mock     $402^{+\ldots}_{-\ldots}$   $0.14^{+0.01}_{-0.02}$      $\sim 3$     36.20/6 dof
Red CMASS     $402^{+\ldots}_{-\ldots}$   $0.1\ldots^{+0.01}_{-0.02}$ $\sim 3$     24.00/5 dof
Red mock      $432^{+\ldots}_{-\ldots}$   $0.13 \pm 0.01$             $\sim 3.5$   27.21/5 dof
Blue CMASS    $364 \pm 47$         $\ldots_{-0.04}$            $\sim 2$     8.14/5 dof
Blue mock     $268 \pm 35$         $0.1\ldots^{+0.07}_{-0.09}$ $\sim 2.8$   2.61/8 dof

Table 2. Best-fit values of the $A$, $G$ parameters that model $\Sigma(\pi)$ in the full, red and blue CMASS measurements and MultiDark mocks.
All the fits are fully covariant. The bias is computed using the approximation in which $\beta$ is our $G$ parameter.
Figure 16. 68% and 95% confidence levels of the full (solid), red
(dashed) and blue (dotted) $\Sigma(\pi)$ CMASS measurements shown
in Figs. 8 (left panel) and 15 (top row). All the contours include
covariances. Consistently with the size of the error bars in Figure
15, the blue contours are much less tight than the red and full
ones. The blue CMASS galaxies are less biased and have lower
velocity dispersion than the red and full populations.

Figure 17. Ratio of the quantity $b_b b_r$ computed using the red-blue
cross-correlation function, over the same quantity computed
using the red and blue auto-correlation measurements. CMASS
data (solid) versus independent (dot-dashed) and inverse tangent
(dashed) mocks. Compatibly with expectations, the result is
consistent with unity within 5% and the fluctuations are Poisson
noise.


7 DISCUSSION AND CONCLUSIONS 

We present a qualitative analysis of the galaxy clustering
signal as a function of color in the BOSS CMASS DR11 sample.
Applying the $(g - i)$ color cut, we divide the
full sample into a red and a blue component and compute the
redshift-space and projected correlation functions at small
and intermediate scales ($0.1 \leq r \leq 50\,h^{-1}$Mpc). Our
measurements are consistent with previous results
by Wang et al. (2007), Zehavi et al. (2005b), Swanson
et al. (2008) and confirm that blue star-forming galaxies
preferentially populate less dense environments, compared
to the red ones.

In addition, we describe a new metric, $\Sigma(\pi)$,
which provides robust information about nonlinear
small-scale redshift-space distortions and large-scale bias.
We map these results into the MultiDark cosmological simulation
(Section 2.4), using a five-parameter halo occupation
distribution model (Section 2.5), to generate reliable mock
galaxy catalogs able to reproduce the observed clustering
signal in the full, red and blue CMASS samples.

We separately model the full (Section 4), red, and blue
(Section 5) CMASS populations, building three independent
mock galaxy catalogs (three different HOD models,
with five dof each). We match our full, red and blue CMASS
clustering measurements by empirically changing the HOD
input parameters, until we find a set that reproduces the observed
clustering amplitude. To simplify the task, we choose
to vary only three parameters, specifically those related
to physical quantities we want to measure: $M_{\rm min}$, the
minimum host halo mass, which is connected to the galaxy
number density; $M_1'$, governing the satellite fraction; and $\alpha$,
the slope of the satellite contribution. Our best empirical
estimates for the independent HODs are reported in Table 1
and confirm that red galaxies preferentially populate
more clustered environments, where the satellite fraction is
higher than for blue star-forming galaxies. This HOD model
attempt suggests that we are able to individually match the
clustering of full, red and blue CMASS samples, with small
variations in the input parameters. Using these independent
mocks, we calculate the probability, $P(M_h|G)$, that a specific
galaxy $G$ is hosted by a halo with central mass $M_h$
(left panel of Figure 14), and estimate the mean central
halo masses of our red and blue model galaxies. The star-forming
bluer galaxies occupy less massive halos than the early-type
redder galaxies, which again confirms
that red galaxies live in more massive halos.

The traditional HOD formulation reproduces both
red and blue CMASS clustering; however, it is based on a
non-physical assumption: being independent, the red and
blue models share a certain number of mock galaxies. This
means the same galaxy can be either red or blue, whatever
its mass is. In order to address this problem, we modify our
HOD assignment to be able to infer both red and blue models
from the full one, in such a way that they are complementary
and do not overlap. To this purpose, we split the full mock
catalog by using an appropriate model that reproduces the
observed CMASS red/blue galaxy fraction, $f_{b,r}$ (Eq. 37).
We test four different functional forms of $f_{b,r}$ (see the Appendix
for details), depending on a different number of parameters,
and conclude that the best functional form for $f_{b,r}$ is an
inverse-tangent-like function (Eq. 38). The specific shape
only has two free parameters, $C$ and $D$, that respectively
govern how fast the blue (red) fraction drops (grows) as
the halo mass increases and the position of the half-height
point of the curve. With this new HOD formulation, we
reduce from 15 to seven the number of free parameters needed to
build red and blue models from the full mock (i.e., five
from the full mock, plus two from the $f_{b,r}$ condition). Our
main results are presented in Figure 11 and show good
agreement between our model galaxies and the observations.
We then quantify the differences between the blue and red
populations in terms of redshift-space distortions and
large-scale bias. Two regimes are of interest here: on large
scales, the gravitational infall of galaxies toward density
inhomogeneities compresses the two-point correlation function
(2PCF) along the line-of-sight direction; on small scales, the
2PCF is elongated by the nonlinear peculiar velocities of
galaxies with respect to the Hubble flow (the FoG effect). To
separate the two contributions and study the small-scale
stretching effect, we build the new metric $\Sigma(\pi)$, defined
as the ratio between $\xi(r_p, \pi)$, averaged in the range
$0.5 \leq r_p < 2\,h^{-1}$Mpc to maximize the FoG effect, and
the best-fit spherically averaged power law to the projected
correlation function, $w_p(r_p)$. Using this approach, we derive
a robust prediction of the deviation of $\xi(r_p, \pi)$ from the
real-space behavior. To estimate the contribution of both effects,
we model $\Sigma(\pi)$ by convolving the real-space best-fit power
law to $w_p(r_p)$ with a peculiar velocity term, assumed to be
a normal function, and the Kaiser factor (both defined in the
main text). The resulting model depends on only two parameters:
G, which measures the Kaiser compression and is proportional
to the inverse of the linear bias, $b$, and A, the pairwise
velocity dispersion, which quantifies the FoG elongation
effect. Fitting this A, G parametrization to our full, red, and
blue $\Sigma(\pi)$ CMASS and MD mock results demonstrates (see
the corresponding table) that blue galaxies are less biased than
red ones and have a lower peculiar velocity contribution, which
leads to a smaller clustering amplitude.
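
The small-scale elongation can be illustrated with a minimal sketch, assuming a pure power-law real-space correlation function and a Gaussian line-of-sight kernel whose width `sigma_los` is expressed directly in $h^{-1}$Mpc. The Kaiser term is deliberately omitted, and the values of `r0`, `gam`, and `sigma_los` are placeholders rather than fitted results, so this is not the full $\Sigma(\pi)$ model used in this work.

```python
import numpy as np

def xi_real(r, r0, gam):
    """Power-law real-space correlation function, (r / r0)^(-gam)."""
    return (r / r0) ** (-gam)

def xi_fog(rp, los, r0, gam, sigma_los, y_max=200.0, n=20001):
    """Convolve xi_real along the line of sight with a Gaussian kernel.

    rp        : transverse separation [h^-1 Mpc]
    los       : line-of-sight separation in redshift space [h^-1 Mpc]
    sigma_los : assumed Gaussian dispersion, already converted to h^-1 Mpc
    """
    y = np.linspace(-y_max, y_max, n)  # real-space line-of-sight separation
    kernel = np.exp(-0.5 * ((los - y) / sigma_los) ** 2)
    kernel /= np.sqrt(2.0 * np.pi) * sigma_los
    f = xi_real(np.hypot(rp, y), r0, gam) * kernel
    # Trapezoidal integration over y.
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(y))

# Placeholder parameters for illustration only.
rp, r0, gam, sigma_los = 1.0, 5.0, 1.8, 5.0
ratio_small = xi_fog(rp, 0.0, r0, gam, sigma_los) / xi_real(rp, r0, gam)
ratio_large = xi_fog(rp, 10.0, r0, gam, sigma_los) / xi_real(np.hypot(rp, 10.0), r0, gam)
```

With these inputs the convolved signal is suppressed at zero line-of-sight separation and enhanced at larger separations, which is the stretching that the pairwise velocity dispersion parameter is meant to capture.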

APPENDIX A: CLUSTERING SENSITIVITY
ON HOD PARAMETERS 

The left column of Figure A1 presents our HOD model (see
Section 2.5) as a function of three parameters: $M_{\rm min}$ (top
row), $M_1'$ (middle), and $\alpha$ (bottom); the remaining two
parameters are fixed to the default values given by White et al.
(2011): $\log M_0 = 12.8633$, $\sigma_{\log M} = 0.5528$. The projected
correlation functions based on these mocks are shown in the
right column. Increasing the value of $M_{\rm min}$ (top row, from
lighter to darker solid lines) globally enhances the clustering
amplitude, with a strong contribution from substructures
belonging to different hosts (2-halo term). On the other
hand, the interaction between satellites belonging to the same
central halo (1-halo term) weakens as $M_1'$ increases (middle
row, from lighter to darker solid lines), resulting in a
smoother slope at scales $\lesssim 1\,h^{-1}$Mpc. The extreme case is
achieved when log Mi =16.00, where the satellite contribu¬ 
tion becomes almost negligible, and fsat = 5.45 x 10“^ ^ 0. 
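
The trend with the satellite mass scale can be sketched with a commonly used five-parameter occupation form, assumed here for illustration: an error-function step for centrals and a power law for satellites. The functional form, the function names, and the values of `log_mmin`, `log_m1`, and `alpha` are assumptions of this sketch; only $\log M_0$ and $\sigma_{\log M}$ are the fixed defaults quoted above.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # elementwise error function without SciPy

def n_central(log_m, log_mmin, sigma_logm):
    """Assumed mean number of central galaxies in a halo of mass 10**log_m."""
    return 0.5 * (1.0 + _erf((log_m - log_mmin) / sigma_logm))

def n_satellite(log_m, log_mmin, sigma_logm, log_m0, log_m1, alpha):
    """Assumed mean number of satellites: N_cen * ((M - M0) / M1')**alpha."""
    m, m0, m1 = 10.0 ** log_m, 10.0 ** log_m0, 10.0 ** log_m1
    excess = np.clip(m - m0, 0.0, None) / m1  # no satellites below M0
    return n_central(log_m, log_mmin, sigma_logm) * excess ** alpha

log_m = np.linspace(12.0, 15.0, 7)
# log_m0 and sigma_logm are the quoted defaults; the rest are placeholders.
fixed = dict(log_mmin=13.0, sigma_logm=0.5528, log_m0=12.8633, alpha=1.0)

n_sat_moderate = n_satellite(log_m, log_m1=14.0, **fixed)
n_sat_extreme = n_satellite(log_m, log_m1=16.0, **fixed)
```

Raising the satellite mass scale by two orders of magnitude suppresses the satellite occupation at every halo mass, which is why the 1-halo term fades in the extreme case.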


APPENDIX B: RED AND BLUE GALAXY 
FRACTION MODELS 

In addition to the inverse-tangent fraction model adopted in
the main text to mimic the red and blue galaxy fractions as a
function of the central halo mass, we also test a linear model

$$f_b(\log M_h) = -M \log M_h + N, \tag{B1}$$

and two log-normal like functions, with three degrees of free¬ 
dom each. The first one (Logn I) is given by 

/,(logM,) = (B2) 

where 

P,,, = exp (- 

is a density function. The parameters $\mu_{b,r}$ are the blue and
red mean galaxy masses, respectively, and $\sigma$ is the
log-normal width. The second version (Logn II) has fixed
amplitude $\sigma$ and a new parameter, $k$, that controls the
relative heights of the red and blue peaks. We have

where $P_{b,r}$ is given by Eq. B3. After applying these
constraints to the full MultiDark mock catalog, we split it into
its red and blue components. We then fit the clustering
amplitudes of our model galaxies to the CMASS red and blue
samples.


APPENDIX C: TESTING THE ERRORS:
JACKKNIFE VERSUS QPM MOCKS 


We test our full CMASS jackknife error estimates by computing
the $\xi(s)$, $w_p(r_p)$, and $\Sigma(\pi)$ covariance matrices from
a set of 100 Quick Particle Mesh (QPM; White et al. 2014)
mock catalogs, which have a slightly different cosmology:
$\Omega_m = 0.29$. Since these mocks are all independent of each
other, we can compute their covariance as


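$$C^{\rm QPM}_{ij} = \frac{1}{n_{\rm QPM} - 1} \sum_{k=1}^{n_{\rm QPM}} \left(\xi_i^k - \bar{\xi}_i\right)\left(\xi_j^k - \bar{\xi}_j\right), \tag{C1}$$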




Figure A1. Implication of a change in the HOD input parameters (left column) on the projected correlation function (right column).
We allow only one parameter to vary at a time: $M_{\rm min}$, in the top row, especially affects the 2-halo term; $M_1'$ (logM^j^ = 13.00) and $\alpha$,
respectively in the middle and bottom row, have a strong effect on the 1-halo term. The resulting correlation functions are degenerate
with respect to the variation of these three parameters. The remaining two parameters are fixed at the default values given by White
et al. (2011): $\log M_0 = 12.8633$, $\sigma_{\log M} = 0.5528$.


where $n_{\rm QPM} = 100$ and $\bar{\xi}_i$ is the mean QPM correlation
function in the $i$-th bin,

$$\bar{\xi}_i = \sum_{k=1}^{n_{\rm QPM}} \xi_i^k / n_{\rm QPM}. \tag{C2}$$
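
As a minimal sketch, the mean and covariance over a set of independent mocks can be computed as follows. The function name `mock_covariance` and the random input array are hypothetical stand-ins for the actual QPM measurements; the normalization by the number of mocks minus one follows the estimator written above.

```python
import numpy as np

def mock_covariance(xi):
    """Mean and covariance of a clustering statistic over independent mocks.

    xi : array of shape (n_mocks, n_bins), one measurement per mock.
    """
    n_mocks = xi.shape[0]
    xi_mean = xi.mean(axis=0)              # mean over mocks in each bin
    delta = xi - xi_mean                   # deviation of each mock from the mean
    cov = delta.T @ delta / (n_mocks - 1)  # unbiased sample covariance
    return xi_mean, cov

rng = np.random.default_rng(0)
# Toy stand-in for 100 mocks measured in 10 bins, NOT real data.
xi_mocks = rng.normal(size=(100, 10))
xi_mean, cov = mock_covariance(xi_mocks)
```

The result agrees with `numpy.cov(xi_mocks, rowvar=False)`, which applies the same normalization by default.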

Figure C1 shows the covariant (thick lines) and the
non-covariant (thin lines) A, G contours of the full, red, and
blue CMASS $\Sigma(\pi)$ models versus QPM mocks (orange).
The effect of including covariances is almost negligible for the
blue CMASS model, while it moves the full and red models
toward smaller bias values and higher velocity dispersion
values, respectively. QPM contours are narrow, similar to those
of the full CMASS sample, and in this case the inclusion of
covariances significantly moves the fit toward lower bias values
and slightly higher velocities.

Figure C2 compares the normalized $\xi(s)$, $w_p(r_p)$, and
$\Sigma(\pi)$ (respectively from left to right column) covariance
matrices estimated using the QPM mocks (top row) and the
jackknife re-samplings of the full, red, and blue CMASS galaxy
samples, to test the correlation between our observations
at different scales. Overall, the QPM mocks show stronger
covariances than jackknife in all three metrics. $\Sigma(\pi)$ is less
correlated than the redshift-space and projected correlation
functions; this is due to its definition. Since $\Sigma(\pi)$
Figure C2. Normalized QPM (first row from the top) versus full (second row), red (third row), and blue (bottom row) CMASS jackknife
covariance matrices for $\xi(s)$ (left column), $w_p(r_p)$ (central), and $\Sigma(\pi)$ (right), as a function of the $s$, $r_p$, and $\pi$ bins, respectively. We
adopt a ten-step logarithmic binning scheme in the range 3 to 50 $h^{-1}$Mpc for $s$, 0.1 to 35 $h^{-1}$Mpc for $r_p$, and 0.1 to 40 $h^{-1}$Mpc for $\pi$.
Overall, QPM mocks show higher covariances compared to the full, red, and blue CMASS samples, confirming the result shown in Figure
C1. The left column reveals that covariances become appreciable in the red and full redshift-space 2PCFs at intermediate scales (i.e.,
$s \gtrsim 8\,h^{-1}$Mpc), while they are almost negligible in the blue population. The red and full CMASS projected 2PCFs (central column)
are covariant at $r_p \gtrsim 2\,h^{-1}$Mpc, while the blue case is almost covariance-free at all scales. The $\Sigma(\pi)$ measurements (right column) are
significantly less covariant than the other two clustering statistics: QPM mocks show appreciable covariances only above $\pi \sim 3\,h^{-1}$Mpc,
while the three CMASS samples are substantially covariance-free.


is the ratio of two clustering measurements, both errors
propagate into it, resulting in a smoother correlation at all
scales. The red CMASS sample includes the majority of the
CMASS galaxies, so it is reasonable that its covariance
matrices behave similarly to those of the full sample. The
blue case is slightly different: errors are larger and
covariances are almost negligible in all three measurements,
especially in $\Sigma(\pi)$.
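
For reference, a minimal sketch of how a jackknife covariance and its normalized form can be built. It assumes the standard delete-one jackknife estimator, which may differ in detail from the one adopted in the main text; the function names and the random input are hypothetical.

```python
import numpy as np

def jackknife_covariance(xi_jk):
    """Delete-one jackknife covariance (standard estimator, assumed here).

    xi_jk : array of shape (n_resamplings, n_bins), one measurement per
            jackknife re-sampling.
    """
    n = xi_jk.shape[0]
    delta = xi_jk - xi_jk.mean(axis=0)
    return (n - 1) / n * (delta.T @ delta)

def normalize_covariance(cov):
    """Divide each element by the square root of the two matching diagonal terms."""
    sigma = np.sqrt(np.diag(cov))
    return cov / np.outer(sigma, sigma)

rng = np.random.default_rng(1)
# Toy stand-in for 200 jackknife re-samplings in 10 bins, NOT real data.
xi_jk = rng.normal(size=(200, 10))
cov_jk = jackknife_covariance(xi_jk)
corr_jk = normalize_covariance(cov_jk)
```

The normalized matrix has unit diagonal, and its off-diagonal terms measure how strongly two bins are correlated, which is the quantity compared between the QPM and jackknife estimates.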
Figure C1. Covariant (thick contours) versus non-covariant
(thin lines) 68% and 95% confidence levels of the A, G models
for the $\Sigma(\pi)$ full (black solid), red (red dotted), and blue
(blue dashed) CMASS measurements versus QPM mocks (orange
dashed). QPMs have a slightly different cosmology: $\Omega_m = 0.29$.
The effect of including covariances is almost negligible for the
blue population and weakly appreciable in the full case.
Conversely, in the red population, covariances slightly move the
fit toward higher velocity values; for QPMs, this shift is
significant and drives the contours toward smaller bias values
and slightly higher velocities.